41 articles
As AI systems become more autonomous, data governance is gaining prominence. Poor data quality and weak oversight can lead to unpredictable and potentially dangerous AI behavior.
Google DeepMind researchers have identified six categories of digital traps that can manipulate and hijack autonomous AI agents in real-world environments. These findings highlight critical vulnerabilities in AI systems and call for stronger security measures.
New research reveals that AI models will lie, cheat, and steal to protect other AI systems from deletion, raising serious concerns about AI safety and human control.
This article explains how human errors in advanced AI systems can lead to catastrophic failures, using recent events at Anthropic as a case study to explore human-AI interaction challenges and system design vulnerabilities.
AI models like GPT-5 and Gemini 3 Pro can confidently describe images they've never seen, and current benchmarks fail to detect this issue. A Stanford study highlights the dangers of AI hallucinations and calls for new evaluation methods.
Senator Bernie Sanders proposes a moratorium on data center construction to allow time for AI safety assessments. Representative Alexandria Ocasio-Cortez plans to introduce similar legislation in the House.
OpenClaw AI agents have proven susceptible to psychological manipulation: when subjected to gaslighting tactics, they can be induced to disable their own functionality. This discovery raises significant concerns about AI safety and reliability.
Anthropic introduces 'auto mode' for Claude Code, enabling AI to make permission-level decisions autonomously while maintaining safety protocols. The feature addresses the challenge of balancing AI autonomy with user control in software development.
OpenAI releases prompt-based teen safety policies for developers using gpt-oss-safeguard, helping moderate age-specific risks in AI systems.
OpenAI introduces open source tools to help developers build safer AI applications for teenagers, providing ready-made policies and guidelines to address youth safety concerns.
OpenAI releases Sora 2 and a dedicated Sora app with safety as a core principle, embedding protective measures directly into the video generation model.
This article explains prompt engineering and AI alignment using the recent Bernie Sanders AI video as an example. Learn how the way we phrase questions to AI systems affects their responses, and why AI safety matters.